An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data
نویسندگان
چکیده
Human action recognition is an important task in computer vision. Extracting discriminative spatial and temporal features to model the spatial and temporal evolutions of different actions plays a key role in accomplishing this task. In this work, we propose an end-to-end spatial and temporal attention model for human action recognition from skeleton data. We build our model on top of the Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), which learns to selectively focus on discriminative joints of skeleton within each frame of the inputs and pays different levels of attention to the outputs of different frames. Furthermore, to ensure effective training of the network, we propose a regularized cross-entropy loss to drive the model learning process and develop a joint training strategy accordingly. Experimental results demonstrate the effectiveness of the proposed model, both on the small human action recognition dataset of SBU and the currently largest NTU dataset.
منابع مشابه
Enhanced skeleton visualization for view invariant human action recognition
Human action recognition based on skeletons has wide applications in human–computer interaction and intelligent surveillance. However, view variations and noisy data bring challenges to this task. What’s more, it remains a problem to effectively represent spatio-temporal skeleton sequences. To solve these problems in one goal, this work presents an enhanced skeleton visualization method for vie...
متن کاملAction Recognition with Visual Attention on Skeleton Images
Action recognition with 3D skeleton sequences is becoming popular due to its speed and robustness. The recently proposed Convolutional Neural Networks (CNN) based methods have shown good performance in learning spatio-temporal representations for skeleton sequences. Despite the good recognition accuracy achieved by previous CNN based methods, there exist two problems that potentially limit the ...
متن کاملSkeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates
Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to spatial domain as well as temp...
متن کاملAction recognition using length-variable edge trajectory and spatio-temporal motion skeleton descriptor
Representing the features of different types of human action in unconstrained videos is a challenging task due to camera motion, cluttered background, and occlusions. This paper aims to obtain effective and compact action representation with length-variable edge trajectory (LV-ET) and spatio-temporal motion skeleton (STMS). First, in order to better describe the long-term motion information for...
متن کاملOnline Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks
Human action recognition from well-segmented 3D skeleton data has been intensively studied and attracting an increasing attention. Online action detection goes one step further and is more challenging, which identifies the action type and localizes the action positions on the fly from the untrimmed stream. In this paper, we study the problem of online action detection from the streaming skeleto...
متن کامل